AITopics | secure and trustworthy


secure and trustworthy


Large Language Model Bias Mitigation from the Perspective of Knowledge Editing

Chen, Ruizhe, Li, Yichen, Xiao, Zikai, Liu, Zuozhu

arXiv.org Artificial Intelligence, Jun-29-2024

Existing debiasing methods inevitably make unreasonable or undesired predictions because they are designed and evaluated to achieve parity across different social groups but leave aside individual facts, resulting in modified existing knowledge. In this paper, we first establish a new bias mitigation benchmark, BiasKE, leveraging existing and additionally constructed datasets, which systematically assesses debiasing performance with complementary metrics on fairness, specificity, and generalization. Meanwhile, we propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration on individual biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with remarkable debiasing performance while not hampering overall model capability for knowledge preservation, highlighting the prospect of fine-grained debiasing strategies for editable fairness in LLMs.
Pre-trained Large Language Models (LLMs) have demonstrated exceptional performance on many tasks (Devlin et al., 2018; Floridi & Chiriatti, 2020; Brown et al., 2020). However, the encoded social stereotypes and human-like biases inevitably cause undesired behaviors when deploying LLMs in practice (Zhao et al., 2019; Navigli et al., 2023; Sheng et al., 2021). Existing approaches to mitigate biases in LLMs are mainly categorized into: (1) Fine-tuning (Zmigrod et al., 2019; Webster et al., 2020; He et al., 2022; Liang et al., 2020; Lauscher et al., 2021), which includes techniques such as re-balanced corpus pre-training, contrastive learning, projection methods, and efficient parameter tuning.
However, existing techniques treat social groups as interchangeable (Gallegos et al., 2023) and neutralize protected attributes of different social groups in model inputs or outputs, while ignoring or [...]. Furthermore, existing debiasing evaluation metrics mainly focus on the degree of bias but fail to measure whether the model retains its original knowledge (Gallegos et al., 2023) of discerning reasonable disparities among different social groups.

arxiv preprint arxiv, knowledge, language model, (13 more...)

arXiv.org Artificial Intelligence

2405.09341

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Afghanistan (0.04)
Europe > Portugal > Porto > Porto (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)


The Effect of Model Size on LLM Post-hoc Explainability via LIME

Heyen, Henning, Widdicombe, Amy, Siegel, Noah Y., Perez-Ortiz, Maria, Treleaven, Philip

arXiv.org Artificial Intelligence, May-8-2024

Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e. their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.
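For readers unfamiliar with LIME, the general idea of a perturbation-based local explanation can be sketched as follows. This is a simplified illustration, not the setup used in the paper: `toy_model`, `lime_explain`, the distance kernel, and the sample sentence are all assumptions made for this example, and the actual `lime` library differs in details such as its distance measure and feature selection.

```python
import math
import random


def toy_model(tokens):
    """Hypothetical black-box scorer standing in for a real classifier."""
    score = 0.0
    if "great" in tokens:
        score += 2.0
    if "boring" in tokens:
        score -= 1.0
    return score


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lime_explain(tokens, model, num_samples=500, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate over random token maskings.

    Assumes the tokens are unique, since they are used as dictionary keys.
    """
    rng = random.Random(seed)
    d = len(tokens)
    rows, targets, weights = [], [], []
    for _ in range(num_samples):
        # 1 keeps a token, 0 removes it from the perturbed input.
        mask = [rng.randint(0, 1) for _ in range(d)]
        kept = [t for t, m in zip(tokens, mask) if m]
        # Assumed distance: fraction of tokens removed from the original.
        dist = 1.0 - sum(mask) / d
        weights.append(math.exp(-(dist ** 2) / kernel_width ** 2))
        rows.append(mask + [1])  # trailing 1 is the intercept column
        targets.append(model(kept))
    # Weighted normal equations with a tiny ridge term for stability.
    p = d + 1
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
          + (1e-6 if i == j else 0.0) for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
         for i in range(p)]
    beta = solve(A, b)
    return dict(zip(tokens, beta[:d]))


explanation = lime_explain("the film was great but boring".split(), toy_model)
for token, weight in sorted(explanation.items(), key=lambda kv: -abs(kv[1])):
    print(f"{token:>8s} {weight:+.3f}")
```

The surrogate's coefficients are the explanation: tokens with large absolute weights are the ones the black-box score depends on most near this input. Faithfulness and plausibility, as discussed in the abstract, are two different ways of judging such weights.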

computational linguistic, explanation, model size, (16 more...)

arXiv.org Artificial Intelligence

2405.05348

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(4 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

Zheng, Danna, Liu, Danyang, Lapata, Mirella, Pan, Jeff Z.

arXiv.org Artificial Intelligence, May-6-2024

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLMs' outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics and achieving results on par with reference-based metrics.
Large-scale language models (LLMs) have recently been in the spotlight due to their impressive performance in various NLP tasks, sparking enthusiasm for potential applications (Kaddour et al., 2023; Bubeck et al., 2023). However, a notable concern has emerged regarding the ability of LLMs to generate plausible yet incorrect responses (Tam et al., 2022; Liu et al., 2023; Devaraj et al., 2022), which is particularly challenging for users without specialized expertise. Consequently, users are often advised to employ LLMs in scenarios where they can confidently assess the information provided.

arxiv preprint arxiv, secure and trustworthy, trustscore, (15 more...)

arXiv.org Artificial Intelligence

2402.12545

Country:

Europe > Spain (0.05)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > United States > West Virginia (0.04)
(10 more...)

Genre:

Research Report > New Finding (0.48)
Personal > Honors (0.30)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Attacks on Third-Party APIs of Large Language Models

Zhao, Wanru, Khazanchi, Vidit, Xing, Haodi, He, Xuanli, Xu, Qiongkai, Lane, Nicholas Donald

arXiv.org Artificial Intelligence, Apr-24-2024

Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as these plugins developed by various third parties cannot be easily trusted. This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.

information, llm, secure and trustworthy, (14 more...)

arXiv.org Artificial Intelligence

2404.16891

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
(6 more...)

Genre: Research Report (0.84)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)


How can AI be made more secure and trustworthy? - Help Net Security

#artificialintelligence, Dec-21-2021

While we're still debating whether and how long it will take to reach singularity and superintelligence, artificial intelligence is playing an increasingly important role in our everyday lives. Artificial intelligence – most commonly machine learning (ML) – is the process of training algorithms using data, instead of explicitly programming them. Such algorithms are already being used in applications ranging from HR to finance and transport to medicine, and in use cases almost too numerous to mention. The benefits of machine learning are obvious: it enables faster analysis of vastly more data than any human or even group of humans is capable of. Many ML applications that can surpass human capabilities already exist, such as those designed to play Go and Chess, or detect fraudulent insurance claims.
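The contrast between explicitly programming a rule and training one from data can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the claim amounts, the labels, and the function names `hand_written_rule`, `fit_threshold`, and `learned_rule` do not come from the article, and a real fraud detector would use far richer features and models.

```python
def hand_written_rule(amount):
    """Explicit programming: a person picks the cut-off (assumed value)."""
    return amount > 10_000


def fit_threshold(examples):
    """Training: pick the cut-off with the fewest errors on labelled data."""
    amounts = sorted(a for a, _ in examples)
    # Candidate cut-offs are midpoints between neighbouring amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(t):
        return sum((a > t) != label for a, label in examples)

    return min(candidates, key=errors)


# Hypothetical labelled claims: (amount, was_fraudulent)
training = [
    (200, False), (950, False), (1800, False),
    (2600, True), (4000, True), (7500, True),
]
threshold = fit_threshold(training)


def learned_rule(amount):
    """The rule induced from the data rather than written by hand."""
    return amount > threshold


print("learned threshold:", threshold)
print("claim of 3000 flagged by hand-written rule:", hand_written_rule(3000))
print("claim of 3000 flagged by learned rule:", learned_rule(3000))
```

The learned rule changes whenever the training data changes, whereas the hand-written rule changes only when someone edits the code.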

decision boundary, ml model, secure and trustworthy, (14 more...)

#artificialintelligence

Industry:

Banking & Finance > Insurance (0.55)
Law Enforcement & Public Safety > Fraud (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Global Big Data Conference

#artificialintelligence, Jun-29-2020, 18:45:31 GMT

As real-world AI deployments increase, IBM says the contributions can help ensure they're fair, secure and trustworthy. IBM on Monday announced it's donating a series of open-source toolkits designed to help build trusted AI to a Linux Foundation project, the LF AI Foundation. "Donation of these projects to LFAI will further the mission of creating responsible AI-powered technologies and enable the larger community to come forward and co-create these tools under the governance of Linux Foundation," IBM said in a blog post, penned by Todd Moore, Sriram Raghavan and Aleksandra Mojsilovic. Specifically, IBM is contributing the AI Fairness 360 Toolkit, the Adversarial Robustness 360 Toolbox and the AI Explainability 360 Toolkit.

data mining, machine learning, programming language, (10 more...)

#artificialintelligence

Country:

South America > Colombia (0.08)
Oceania > New Zealand (0.06)
Oceania > Australia (0.06)
(9 more...)

Industry: Information Technology (1.00)

Technology:

Information Technology > Software > Programming Languages (0.85)
Information Technology > Artificial Intelligence > Machine Learning (0.81)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.59)
Information Technology > Data Science > Data Mining > Big Data (0.40)
